Method:

ACF: The autocorrelation function (ACF) defines how data points in a time series are related, on average, to the preceding data points (Box, Jenkins, & Reinsel, 1994). In other words, it measures the self-similarity of the signal over different delay times. Accordingly, the ACF is a function of the delay or lag τ, which determines the time shift taken into the past to estimate the similarity between data points. https://en.wikipedia.org/wiki/Autocorrelation
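A minimal sketch of the sample ACF may help make the definition concrete. It assumes the usual estimator (lag-τ autocovariance divided by the lag-0 autocovariance); the function name `acf` and the toy series are illustrative and not taken from the analysis.

```python
def acf(x, max_lag):
    """Sample autocorrelation of x for lags 0..max_lag."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev)  # lag-0 autocovariance (times n)
    result = []
    for tau in range(max_lag + 1):
        # Sum of products between each point and the point tau steps earlier
        c_tau = sum(dev[t] * dev[t - tau] for t in range(tau, n))
        result.append(c_tau / c0)
    return result


# Toy series that repeats every 6 samples (hypothetical data)
series = [1.0, 2.0, 3.0, 4.0, 3.0, 2.0, 1.0, 2.0, 3.0, 4.0, 3.0, 2.0]
rho = acf(series, max_lag=6)
print(rho[0], rho[3], rho[6])
```

For this toy series the autocorrelation is negative at lag 3 (half a period) and positive again at lag 6 (a full period).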

PACF: The partial autocorrelation at lag k is the correlation that results after removing the effect of any correlations due to the terms at shorter lags. https://en.wikipedia.org/wiki/Partial_autocorrelation_function
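One common way to obtain partial autocorrelations from autocorrelations is the Durbin-Levinson recursion. The sketch below assumes that approach (the source does not say how the PACF was computed); `pacf_from_acf` is a hypothetical helper name.

```python
def pacf_from_acf(rho):
    """Partial autocorrelations for lags 1..K, given autocorrelations rho[0..K].

    Uses the Durbin-Levinson recursion; rho[0] is expected to be 1.
    """
    max_lag = len(rho) - 1
    pacf = []
    phi_prev = []  # coefficients phi_{k-1,1}, ..., phi_{k-1,k-1}
    for k in range(1, max_lag + 1):
        if k == 1:
            phi_kk = rho[1]
        else:
            num = rho[k] - sum(phi_prev[j] * rho[k - 1 - j] for j in range(k - 1))
            den = 1.0 - sum(phi_prev[j] * rho[j + 1] for j in range(k - 1))
            phi_kk = num / den
        # Update the lower-order coefficients, then append the new one
        phi_new = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j] for j in range(k - 1)]
        phi_new.append(phi_kk)
        phi_prev = phi_new
        pacf.append(phi_kk)
    return pacf


# Autocorrelations of an AR(1) process with coefficient 0.5 (rho_k = 0.5**k)
pacf = pacf_from_acf([1.0, 0.5, 0.25, 0.125])
print(pacf)
```

For an AR(1) process, only the lag-1 partial autocorrelation is non-zero, which is the "shorter lags removed" behaviour described above.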

CCF:

In signal processing, cross-correlation is a measure of similarity of two series as a function of the displacement of one relative to the other.

https://en.wikipedia.org/wiki/Cross-correlation
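As a rough illustration of how a "best lag" could be read off a cross-correlation function, the sketch below correlates an outdoor series shifted by k samples with an indoor series and picks the k with the largest correlation. The convention (non-negative lag means the outdoor series leads), the names `ccf` and `best_lag`, and the toy data are assumptions; the 10-minute sampling interval is inferred from the report's own conversion of 27 lags to 4.5 hours.

```python
import math


def ccf(x, y, lag):
    """Sample correlation between x[t - lag] and y[t] for lag >= 0."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sx = math.sqrt(sum((v - mx) ** 2 for v in x))
    sy = math.sqrt(sum((v - my) ** 2 for v in y))
    cov = sum((x[t - lag] - mx) * (y[t] - my) for t in range(lag, n))
    return cov / (sx * sy)


def best_lag(x, y, max_lag):
    """Lag in 0..max_lag with the highest cross-correlation."""
    return max(range(max_lag + 1), key=lambda k: ccf(x, y, k))


# Hypothetical data: the indoor series repeats the outdoor peak 3 samples later
outdoor = [10, 12, 15, 30, 45, 30, 15, 12, 10, 11, 13, 12, 10, 11, 12, 10]
indoor = [outdoor[0]] * 3 + outdoor[:-3]

lag = best_lag(outdoor, indoor, max_lag=6)
MINUTES_PER_SAMPLE = 10  # assumed sampling interval
print(lag, lag * MINUTES_PER_SAMPLE)
```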

Granger causality

The Granger causality test is a statistical hypothesis test for determining whether one time series is useful in forecasting another, first proposed in 1969. Let y and x be stationary time series. To test the null hypothesis that x does not Granger-cause y, one first finds the proper lagged values of y to include in a univariate autoregression of y:

\[ y_t = a_0 + a_1 y_{t-1} + a_2 y_{t-2} + \cdots + a_m y_{t-m} + \text{error}_t. \]

Next, the autoregression is augmented by including lagged values of x:

\[ y_t = a_0 + a_1 y_{t-1} + a_2 y_{t-2} + \cdots + a_m y_{t-m} + b_p x_{t-p} + \cdots + b_q x_{t-q} + \text{error}_t. \]

One retains in this regression all lagged values of x that are individually significant according to their t-statistics, provided that collectively they add explanatory power to the regression according to an F-test (whose null hypothesis is no explanatory power jointly added by the x’s). In the notation of the above augmented regression, p is the shortest, and q is the longest, lag length for which the lagged value of x is significant.

The null hypothesis that x does not Granger-cause y is not rejected if and only if no lagged values of x are retained in the regression.

https://en.wikipedia.org/wiki/Granger_causality
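The joint F-test can be written out explicitly. The form below assumes the standard comparison of residual sums of squares between a restricted model (lags of y only) and an unrestricted model (lags of y and x), both with the same lag order \(m\), as in the `Lags(…, 1:m)` models reported in this document. The symbols \(RSS_r\), \(RSS_u\) and \(N\) (number of records) are introduced here for illustration.

\[ F = \frac{(RSS_r - RSS_u)/m}{RSS_u/(N - 3m - 1)} \]

The denominator degrees of freedom are the \(N - m\) usable observations minus the \(2m + 1\) coefficients of the unrestricted model. For example, \(N = 3882\) and \(m = 27\) give \(3882 - 81 - 1 = 3800\), which matches the `Res.Df` reported for the cleaned data of participant 126853.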

Update (May 1)

Consistent date range: March 29 ~ April 24

Number of complete sensors: 11

One example (126853)

Outlier cleaning

Rule: Flag an indoor reading as an outlier and replace it if it exceeds the maximum outdoor reading observed within the time window (6 hours).
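The rule can be sketched as follows. Several details are assumptions, since the report does not state them: the window is taken as the trailing 36 samples (6 hours at an assumed 10-minute interval), and a flagged indoor value is replaced by the window maximum of the outdoor series. `clean_indoor` is a hypothetical helper name and the data are made up.

```python
def clean_indoor(indoor, outdoor, window=36):
    """Cap indoor readings at the outdoor maximum of the trailing window.

    Returns the cleaned series and the number of readings flagged as outliers.
    """
    cleaned = []
    n_outliers = 0
    for t, value in enumerate(indoor):
        start = max(0, t - window + 1)
        cap = max(outdoor[start:t + 1])  # max outdoor value in the window
        if value > cap:
            n_outliers += 1
            cleaned.append(cap)  # assumed replacement value
        else:
            cleaned.append(value)
    return cleaned, n_outliers


# Toy example with a 3-sample window (hypothetical data)
indoor = [5, 6, 40, 7, 8]
outdoor = [10, 12, 11, 9, 10]
cleaned, n_outliers = clean_indoor(indoor, outdoor, window=3)
print(cleaned, f"outlier %: {n_outliers}/{len(indoor)} = {100 * n_outliers / len(indoor):.2f}%")
```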

## [1] "outlier %: 1003/3882 = 25.84%"

Concern 1: the outlier percentage is high.

Compare before and after outlier removal

Cross Correlation Function

Granger causality test

## [1] "For cleaned data, the best lag: 27 (4.5 hours)"

Granger causality test results:

For raw data:

## Granger causality test
## 
## Model 1: AQI_i ~ Lags(AQI_i, 1:2) + Lags(AQI_o, 1:2)
## Model 2: AQI_i ~ Lags(AQI_i, 1:2)
##   Res.Df Df      F   Pr(>F)   
## 1   3875                      
## 2   3877 -2 6.5785 0.001406 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

For cleaned data:

## Granger causality test
## 
## Model 1: AQI_i ~ Lags(AQI_i, 1:27) + Lags(AQI_o, 1:27)
## Model 2: AQI_i ~ Lags(AQI_i, 1:27)
##   Res.Df  Df      F    Pr(>F)    
## 1   3800                         
## 2   3827 -27 9.0202 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Placebo Test

For the cleaned data, design a placebo test.

Model 0: AQI_i ~ Lags(AQI_i, 1:27)

Experimental group (outdoor AQI over the previous 4.5 hours): Model 1: AQI_i ~ Lags(AQI_i, 1:27) + Lags(AQI_o, 1:27)

Placebo group (outdoor AQI from the 4.5-hour window shifted back by \(t\)): Model 2: AQI_i ~ Lags(AQI_i, 1:27) + Lags(AQI_o, (1+t):(27+t))

Here, \(t\) is measured in 10-minute samples and can be 24*6 (1 day), 48*6 (2 days), …, 24*7*6 (1 week). Model 2 therefore uses the 4.5 hours of outdoor AQI recorded \(t\) samples earlier as an input to predict the indoor AQI, which does not make sense physically, so a high p-value is expected.

Test Model 0 against Model 1, test Model 0 against Model 2, and compare the degree of significance (p-values).
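The placebo lag windows can be enumerated with a few lines of code. This sketch only builds the lag indices for the shifted outdoor term; it assumes 6 samples per hour (as implied by the 24*6 shift for one day) and daily shifts from 1 to 7 days. `placebo_lag_range` is a hypothetical helper name.

```python
SAMPLES_PER_HOUR = 6  # assumed: one reading every 10 minutes
BEST_LAG = 27         # 4.5 hours, the best lag reported for the cleaned data


def placebo_lag_range(t, best_lag=BEST_LAG):
    """Outdoor lags (1+t)..(best_lag+t) used in the placebo model."""
    return list(range(1 + t, best_lag + t + 1))


# Shifts of 1 day, 2 days, ..., 1 week, expressed in samples
shifts = [days * 24 * SAMPLES_PER_HOUR for days in range(1, 8)]

for t in shifts:
    lags = placebo_lag_range(t)
    print(f"t = {t}: outdoor lags {lags[0]}..{lags[-1]}")
```

Each placebo window has the same length as the experimental window, so the two F-tests against Model 0 add the same number of outdoor terms.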

Conclusion: As expected, the p-values of the experimental group are lower than those of the placebo group.

Summary: 11 complete participants

## # A tibble: 11 x 8
##    participant `Valid records` `outlier %` `time_lags (mins)` `time_lags (hours~
##          <dbl>           <dbl>       <dbl>              <dbl>              <dbl>
##  1      126853            3882        25.8                270              4.5  
##  2      125627            3860        65.7                 80              1.33 
##  3      126603            3808        76.3                200              3.33 
##  4      127177            3888        40.7                 10              0.167
##  5      127183            3829        53.3                130              2.17 
##  6      127187            3874        43.6                 10              0.167
##  7      127213            3888        38.3                310              5.17 
##  8      127221            3857        30.0                 30              0.5  
##  9      127227            3887        56.5                 10              0.167
## 10      127305            2327        30.6                 30              0.5  
## 11      127303            3882        54.1                 20              0.333
## # ... with 3 more variables: P <dbl>, score <dbl>, Resistance <dbl>

\[ \text{Score} = -\log(P\text{-value}) \]

\[ \text{Resistance} = \frac{\text{Score} - \min(\text{Score})}{\max(\text{Score}) - \min(\text{Score})} \]
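A small sketch of the two formulas, under stated assumptions: the natural logarithm is used (the base is not specified in the report, and it does not affect Resistance because min-max scaling cancels any constant factor), and p-values are floored at a tiny positive number so that a p-value printed as 0 does not produce an undefined logarithm. The floor and the name `score_and_resistance` are hypothetical; the three p-values are those printed in the per-participant output for 125627, 127221 and 126853.

```python
import math


def score_and_resistance(p_values, floor=1e-300):
    """Score = -log(p); Resistance = min-max scaled Score."""
    # The floor is an assumed workaround for p-values reported as exactly 0
    scores = [-math.log(max(p, floor)) for p in p_values]
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return scores, [0.0 for _ in scores]  # all scores equal: no spread to scale
    return scores, [(s - lo) / (hi - lo) for s in scores]


p_values = [0.007327019, 1.815044e-05, 2.0204e-35]
scores, resistance = score_and_resistance(p_values)
print(resistance)
```

The participant with the largest p-value gets Resistance 0 and the one with the smallest gets Resistance 1; all others fall in between.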

Detailed results: 11 complete participants

125627

## [1] "outlier %: 2536/3860 = 65.7%"
## [1] "For cleaned data, the best lag: 8 (1.33 hours)"
## Granger causality test
## 
## Model 1: AQI_i ~ Lags(AQI_i, 1:2) + Lags(AQI_o, 1:2)
## Model 2: AQI_i ~ Lags(AQI_i, 1:2)
##   Res.Df Df      F Pr(>F)
## 1   3853                 
## 2   3855 -2 0.3296 0.7192
## Granger causality test
## 
## Model 1: AQI_i ~ Lags(AQI_i, 1:8) + Lags(AQI_o, 1:8)
## Model 2: AQI_i ~ Lags(AQI_i, 1:8)
##   Res.Df Df      F   Pr(>F)   
## 1   3835                      
## 2   3843 -8 2.6217 0.007327 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## [1] 0.007327019

126603

## [1] "outlier %: 2906/3808 = 76.31%"
## [1] "For cleaned data, the best lag: 20 (3.33 hours)"
## Granger causality test
## 
## Model 1: AQI_i ~ Lags(AQI_i, 1:1) + Lags(AQI_o, 1:1)
## Model 2: AQI_i ~ Lags(AQI_i, 1:1)
##   Res.Df Df      F   Pr(>F)   
## 1   3804                      
## 2   3805 -1 6.7565 0.009377 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Granger causality test
## 
## Model 1: AQI_i ~ Lags(AQI_i, 1:20) + Lags(AQI_o, 1:20)
## Model 2: AQI_i ~ Lags(AQI_i, 1:20)
##   Res.Df  Df   F    Pr(>F)    
## 1   3747                      
## 2   3767 -20 110 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## [1] 0

127177

## [1] "outlier %: 1583/3888 = 40.72%"
## [1] "For cleaned data, the best lag: 1 (0.17 hours)"
## Granger causality test
## 
## Model 1: AQI_i ~ Lags(AQI_i, 1:2) + Lags(AQI_o, 1:2)
## Model 2: AQI_i ~ Lags(AQI_i, 1:2)
##   Res.Df Df      F    Pr(>F)    
## 1   3881                        
## 2   3883 -2 22.113 2.822e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Granger causality test
## 
## Model 1: AQI_i ~ Lags(AQI_i, 1:1) + Lags(AQI_o, 1:1)
## Model 2: AQI_i ~ Lags(AQI_i, 1:1)
##   Res.Df Df      F    Pr(>F)    
## 1   3884                        
## 2   3885 -1 102.94 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## [1] 6.826752e-24

127183

## [1] "outlier %: 2042/3829 = 53.33%"
## [1] "For cleaned data, the best lag: 13 (2.17 hours)"
## Granger causality test
## 
## Model 1: AQI_i ~ Lags(AQI_i, 1:2) + Lags(AQI_o, 1:2)
## Model 2: AQI_i ~ Lags(AQI_i, 1:2)
##   Res.Df Df      F Pr(>F)
## 1   3822                 
## 2   3824 -2 1.3506 0.2592
## Granger causality test
## 
## Model 1: AQI_i ~ Lags(AQI_i, 1:13) + Lags(AQI_o, 1:13)
## Model 2: AQI_i ~ Lags(AQI_i, 1:13)
##   Res.Df  Df      F    Pr(>F)    
## 1   3789                         
## 2   3802 -13 10.191 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## [1] 1.737408e-21

127187

## [1] "outlier %: 1688/3874 = 43.57%"
## [1] "For cleaned data, the best lag: 1 (0.17 hours)"
## Granger causality test
## 
## Model 1: AQI_i ~ Lags(AQI_i, 1:2) + Lags(AQI_o, 1:2)
## Model 2: AQI_i ~ Lags(AQI_i, 1:2)
##   Res.Df Df      F    Pr(>F)    
## 1   3867                        
## 2   3869 -2 18.709 8.197e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Granger causality test
## 
## Model 1: AQI_i ~ Lags(AQI_i, 1:1) + Lags(AQI_o, 1:1)
## Model 2: AQI_i ~ Lags(AQI_i, 1:1)
##   Res.Df Df      F    Pr(>F)    
## 1   3870                        
## 2   3871 -1 94.896 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## [1] 3.599921e-22

127213

## [1] "outlier %: 1488/3888 = 38.27%"
## [1] "For cleaned data, the best lag: 31 (5.17 hours)"
## Granger causality test
## 
## Model 1: AQI_i ~ Lags(AQI_i, 1:21) + Lags(AQI_o, 1:21)
## Model 2: AQI_i ~ Lags(AQI_i, 1:21)
##   Res.Df  Df      F    Pr(>F)    
## 1   3824                         
## 2   3845 -21 2.4063 0.0003285 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Granger causality test
## 
## Model 1: AQI_i ~ Lags(AQI_i, 1:31) + Lags(AQI_o, 1:31)
## Model 2: AQI_i ~ Lags(AQI_i, 1:31)
##   Res.Df  Df      F    Pr(>F)    
## 1   3794                         
## 2   3825 -31 7.7704 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## [1] 4.226207e-33

127221

## [1] "outlier %: 1159/3857 = 30.05%"
## [1] "For cleaned data, the best lag: 3 (0.5 hours)"
## Granger causality test
## 
## Model 1: AQI_i ~ Lags(AQI_i, 1:2) + Lags(AQI_o, 1:2)
## Model 2: AQI_i ~ Lags(AQI_i, 1:2)
##   Res.Df Df      F    Pr(>F)    
## 1   3850                        
## 2   3852 -2 10.199 3.821e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Granger causality test
## 
## Model 1: AQI_i ~ Lags(AQI_i, 1:3) + Lags(AQI_o, 1:3)
## Model 2: AQI_i ~ Lags(AQI_i, 1:3)
##   Res.Df Df      F    Pr(>F)    
## 1   3847                        
## 2   3850 -3 8.2467 1.815e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## [1] 1.815044e-05

127227

## [1] "outlier %: 2197/3887 = 56.52%"
## [1] "For cleaned data, the best lag: 1 (0.17 hours)"
## Granger causality test
## 
## Model 1: AQI_i ~ Lags(AQI_i, 1:1) + Lags(AQI_o, 1:1)
## Model 2: AQI_i ~ Lags(AQI_i, 1:1)
##   Res.Df Df      F    Pr(>F)    
## 1   3883                        
## 2   3884 -1 47.224 7.346e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Granger causality test
## 
## Model 1: AQI_i ~ Lags(AQI_i, 1:1) + Lags(AQI_o, 1:1)
## Model 2: AQI_i ~ Lags(AQI_i, 1:1)
##   Res.Df Df      F    Pr(>F)    
## 1   3883                        
## 2   3884 -1 130.33 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## [1] 1.0281e-29

127305

## [1] "outlier %: 713/2327 = 30.64%"
## [1] "For cleaned data, the best lag: 3 (0.5 hours)"
## Granger causality test
## 
## Model 1: AQI_i ~ Lags(AQI_i, 1:2) + Lags(AQI_o, 1:2)
## Model 2: AQI_i ~ Lags(AQI_i, 1:2)
##   Res.Df Df     F   Pr(>F)   
## 1   2320                     
## 2   2322 -2 5.465 0.004287 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Granger causality test
## 
## Model 1: AQI_i ~ Lags(AQI_i, 1:3) + Lags(AQI_o, 1:3)
## Model 2: AQI_i ~ Lags(AQI_i, 1:3)
##   Res.Df Df      F    Pr(>F)    
## 1   2317                        
## 2   2320 -3 19.319 2.266e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## [1] 2.266182e-12

127303

## [1] "outlier %: 2101/3882 = 54.12%"
## [1] "For cleaned data, the best lag: 2 (0.33 hours)"
## Granger causality test
## 
## Model 1: AQI_i ~ Lags(AQI_i, 1:2) + Lags(AQI_o, 1:2)
## Model 2: AQI_i ~ Lags(AQI_i, 1:2)
##   Res.Df Df      F    Pr(>F)    
## 1   3875                        
## 2   3877 -2 12.097 5.792e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Granger causality test
## 
## Model 1: AQI_i ~ Lags(AQI_i, 1:2) + Lags(AQI_o, 1:2)
## Model 2: AQI_i ~ Lags(AQI_i, 1:2)
##   Res.Df Df      F    Pr(>F)    
## 1   3875                        
## 2   3877 -2 36.897 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## [1] 1.338759e-16

126853

## [1] "outlier %: 1003/3882 = 25.84%"
## [1] "For cleaned data, the best lag: 27 (4.5 hours)"
## Granger causality test
## 
## Model 1: AQI_i ~ Lags(AQI_i, 1:2) + Lags(AQI_o, 1:2)
## Model 2: AQI_i ~ Lags(AQI_i, 1:2)
##   Res.Df Df      F   Pr(>F)   
## 1   3875                      
## 2   3877 -2 6.5785 0.001406 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Granger causality test
## 
## Model 1: AQI_i ~ Lags(AQI_i, 1:27) + Lags(AQI_o, 1:27)
## Model 2: AQI_i ~ Lags(AQI_i, 1:27)
##   Res.Df  Df      F    Pr(>F)    
## 1   3800                         
## 2   3827 -27 9.0202 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## [1] 2.0204e-35